This document carries out all analysis for the Utah Aquatic Insect Decline project. Input files are:
longterm_site_richness.csv: yearly average richness and abundance data for each site with at least 4 years of data (n = 142 sites)
longterm_comid_richness.csv: yearly average richness and abundance data for each COMID segment with at least 4 years of data (n = 211 COMIDs)
taxa_densities.csv: abundance and density info for each taxa in each sample (pre-richness calculations) - used for investigating specific taxa of interest.
Sections are:
By Site: Checking distribution, examining trends in richness & density over time (site-by-site & overall)
By COMID: Checking distribution, examining trends in richness & density over time (site-by-site & overall)
Pteronarcys californica
Zapada
Read in datafiles:
site_inverts <- read.csv("cleaned_data/longterm_site_richness.csv")
comid_inverts <- read.csv("cleaned_data/longterm_comid_richness.csv")
taxa_densities <-read.csv("cleaned_data/taxa_densities.csv")%>%
mutate(sampleDate =ymd(sampleDate))
Richness:
ggplot(site_inverts, aes(x = richness_xbar))+
geom_histogram(fill = "Grey", color = "Black", bins = 70)+
theme_classic()+
labs(x = "Mean Richness", y = "Count")
Data are fairly normally distributed, maybe slightly bimodal.
Density:
ggplot(site_inverts, aes(x = density_xbar))+
geom_histogram(fill = "Grey", color = "Black", bins = 70)+
theme_classic()+
labs(x = "Mean Total Density", y = "Count")
Strongly left-skewed - try a log transform:
ggplot(site_inverts, aes(x = log10(density_xbar)))+
geom_histogram(fill = "Grey", color = "Black", bins = 70)+
theme_classic()+
labs(x = "log10(Mean Total Density)", y = "Count")
Better, still slightly skewed - will use log transform for all density plots for now.
Visualization:
site.trends <- ggplot(site_inverts, aes(y = richness_xbar, x = yr, color = factor(siteId)))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
facet_wrap(~factor(siteId), scales = "free")+
theme(legend.position = "none",
strip.text = element_blank(),
strip.background = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())+
labs(x = "Year",
y = "Mean Richness")
site.trends
## `geom_smooth()` using formula 'y ~ x'
A few initial observations:
No consistent trend across all sites - some look flat, some increase, some decrease.
Seems like most sites with longer-term data (more points) generally have increasing trends.
Very clear trends (+ or -) at some sites, others are more scattered.
Would be cool to see if there’s some other variable (location, elevation, land use, etc) that explains different trends @ different sites (i.e. why some are increasing, others are decreasing)
Save:
ggsave("plots/site_richness_trends.jpg", site.trends, device = "jpeg", height = 8.5, width = 8.5, units = "in", dpi = "retina")
## `geom_smooth()` using formula 'y ~ x'
Same visualization as above, but color coded by elevation:
site.elev.trends <- ggplot(site_inverts, aes(y = richness_xbar, x = yr, color = elev_m))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
facet_wrap(~factor(elev_m), scales = "free")+
theme(legend.position = "bottom",
strip.text = element_blank(),
strip.background = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
legend.text = element_text(angle = 45, vjust = 1, hjust = 1))+
labs(x = "Year",
y = "Mean Richness", color = "Elevation, m")
site.elev.trends
## `geom_smooth()` using formula 'y ~ x'
Doesn’t seem to be any consistent pattern across elevations - have both increasing and decreasing trends at all elevations.
Visualization:
site.overall <- ggplot(filter(site_inverts, yr >=1990), aes(y = richness_xbar, x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "Mean Richness")
site.overall
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
site.overall.lm <- lm(richness_xbar~yr, filter(site_inverts, yr >1990))
summary(site.overall.lm)
##
## Call:
## lm(formula = richness_xbar ~ yr, data = filter(site_inverts,
## yr > 1990))
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.3941 -8.5667 -0.1311 7.8549 26.8409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -657.66204 109.48871 -6.007 2.80e-09 ***
## yr 0.33954 0.05453 6.227 7.47e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.46 on 854 degrees of freedom
## Multiple R-squared: 0.04343, Adjusted R-squared: 0.04231
## F-statistic: 38.77 on 1 and 854 DF, p-value: 7.467e-10
Significant increase in richness over time across all sites.
Calculate richness for each year (average all sites):
yr_avgs <- site_inverts %>%
group_by(yr)%>% #group by year
summarise(richness_xbar = mean(richness_xbar), #calculate mean richness
abund_xbar = mean(abund_xbar), #calculate mean abundance
density_xbar= mean(density_xbar), #calculate mean density
n_sites = length(unique(siteId))) #count # of sites per year
Visualization:
yr.overall <- ggplot(filter(yr_avgs, yr >=1990), aes(y = richness_xbar, x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = T, size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "Mean Richness")
yr.overall
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
yr.overall.lm <- lm(richness_xbar~yr, filter(yr_avgs, yr >=1990))
summary(yr.overall.lm)
##
## Call:
## lm(formula = richness_xbar ~ yr, data = filter(yr_avgs, yr >=
## 1990))
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.539 -2.931 1.225 4.179 8.654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -910.6456 219.0868 -4.157 0.000248 ***
## yr 0.4643 0.1091 4.254 0.000189 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.733 on 30 degrees of freedom
## Multiple R-squared: 0.3762, Adjusted R-squared: 0.3555
## F-statistic: 18.1 on 1 and 30 DF, p-value: 0.0001892
Significant increase in overall mean richness over time
Visualization (color coded & sorted by elevation):
site.elev.dens <- ggplot(site_inverts, aes(y = log10(density_xbar), x = yr, color = elev_m))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
facet_wrap(~factor(elev_m), scales = "free")+
theme(legend.position = "bottom",
strip.text = element_blank(),
strip.background = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
legend.text = element_text(angle = 45, vjust = 1, hjust = 1))+
labs(x = "Year",
y = "log10(Mean Total Density)", color = "Elevation, m")
site.elev.dens
## `geom_smooth()` using formula 'y ~ x'
Similar to richness - some sites increasing, others decreasing. No obvious/consistent patterns across elevations.
Visualization:
site.overall.dens <- ggplot(filter(site_inverts, yr >=1990), aes(y = log10(density_xbar), x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "log10(Mean Total Density)")
site.overall.dens
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
site.dens.lm <- lm(log10(density_xbar)~yr, filter(site_inverts, yr >1990))
summary(site.dens.lm)
##
## Call:
## lm(formula = log10(density_xbar) ~ yr, data = filter(site_inverts,
## yr > 1990))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2557 -0.3913 0.1571 0.4753 1.4757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.710619 7.201735 0.932 0.352
## yr -0.001663 0.003587 -0.464 0.643
##
## Residual standard error: 0.6882 on 854 degrees of freedom
## Multiple R-squared: 0.0002517, Adjusted R-squared: -0.0009189
## F-statistic: 0.215 on 1 and 854 DF, p-value: 0.643
No change in density over time when using log transform
Visualization:
dens.overall <- ggplot(filter(yr_avgs, yr >=1990), aes(y = log10(density_xbar), x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = T, size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "log10(Mean Total Density)")
dens.overall
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
dens.overall.lm <- lm(log10(density_xbar)~yr, filter(yr_avgs, yr >=1990))
summary(dens.overall.lm)
##
## Call:
## lm(formula = log10(density_xbar) ~ yr, data = filter(yr_avgs,
## yr >= 1990))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.53659 -0.06689 0.08403 0.15879 0.49567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.645814 14.403394 0.184 0.855
## yr 0.000484 0.007175 0.067 0.947
##
## Residual standard error: 0.3769 on 30 degrees of freedom
## Multiple R-squared: 0.0001517, Adjusted R-squared: -0.03318
## F-statistic: 0.004551 on 1 and 30 DF, p-value: 0.9467
No change in total density over time.
Richness:
ggplot(comid_inverts, aes(x = richness_xbar))+
geom_histogram(fill = "Grey", color = "Black", bins = 70)+
theme_classic()+
labs(x = "Mean Richness", y = "Count")
Same as above, might be slightly more bimodal.
Density:
ggplot(comid_inverts, aes(x = density_xbar))+
geom_histogram(fill = "Grey", color = "Black", bins = 70)+
theme_classic()+
labs(x = "Mean Total Density", y = "Count")
Still very left-skewed; use a log transform:
ggplot(comid_inverts, aes(x = log10(density_xbar)))+
geom_histogram(fill = "Grey", color = "Black", bins = 70)+
theme_classic()+
labs(x = "log10(Mean Total Density)", y = "Count")
Visualization:
comid.trends <- ggplot(comid_inverts, aes(y = richness_xbar, x = yr, color = factor(COMID)))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
facet_wrap(~factor(COMID), scales = "free")+
theme(legend.position = "none",
strip.text = element_blank(),
strip.background = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())+
labs(x = "Year",
y = "Mean Richness")
comid.trends
## `geom_smooth()` using formula 'y ~ x'
No obvious differences from site-by-site data - both increasing and decreasing trends at different sites.
Save:
ggsave("plots/comid_richness_trends.jpg", comid.trends, device = "jpeg", height = 8.5, width = 8.5, units = "in", dpi = "retina")
## `geom_smooth()` using formula 'y ~ x'
Visualization:
comid.overall <- ggplot(filter(comid_inverts, yr >=1990), aes(y = richness_xbar, x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "Mean Richness")
comid.overall
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
comid.overall.lm <- lm(richness_xbar~yr, filter(comid_inverts, yr >=1990))
summary(comid.overall.lm)
##
## Call:
## lm(formula = richness_xbar ~ yr, data = filter(comid_inverts,
## yr >= 1990))
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.3941 -8.5667 -0.1311 7.8549 26.8409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -657.66204 109.48871 -6.007 2.80e-09 ***
## yr 0.33954 0.05453 6.227 7.47e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.46 on 854 degrees of freedom
## Multiple R-squared: 0.04343, Adjusted R-squared: 0.04231
## F-statistic: 38.77 on 1 and 854 DF, p-value: 7.467e-10
Significant increase in richness across all COMID’s (same trend as site-by-site data)
Calculate yearly average richness, abundance, & density:
yr_comid_avgs <- comid_inverts %>%
filter(!yr <1990)%>% #remove obs from the 60's
group_by(yr)%>% #group by year
summarise(
richness_xbar = mean(richness_xbar), #calculate avg richness, abundance, and density across all sites for each year
abund_xbar = mean(abund_xbar),
density_xbar = mean(density_xbar)
)
Visualization:
comid.yr.overall <- ggplot(yr_comid_avgs, aes(y = richness_xbar, x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = T, size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "Mean Richness")
comid.yr.overall
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
comid.yr.lm <- lm(richness_xbar~yr, yr_comid_avgs)
summary(comid.yr.lm)
##
## Call:
## lm(formula = richness_xbar ~ yr, data = yr_comid_avgs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.539 -2.931 1.225 4.179 8.654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -910.6456 219.0868 -4.157 0.000248 ***
## yr 0.4643 0.1091 4.254 0.000189 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.733 on 30 degrees of freedom
## Multiple R-squared: 0.3762, Adjusted R-squared: 0.3555
## F-statistic: 18.1 on 1 and 30 DF, p-value: 0.0001892
Significant increase in richness across all COMIDs over time (same result as site-by-site data)
Visualization:
comid.density <- ggplot(comid_inverts, aes(y = log10(density_xbar), x = yr, color = factor(COMID)))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
facet_wrap(~factor(COMID), scales = "free")+
theme(legend.position = "none",
strip.text = element_blank(),
strip.background = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())+
labs(x = "Year",
y = "log10(Mean Total Density)")
comid.density
## `geom_smooth()` using formula 'y ~ x'
No obvious trends stand out - variety of responses at different COMIDs.
Visualization:
comid.dens.overall <- ggplot(filter(comid_inverts, yr >=1990), aes(y = log10(density_xbar), x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5)+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "log10(Mean Total Density)")
comid.dens.overall
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
comid.dens.lm <- lm(log10(density_xbar)~yr, comid_inverts)
summary(comid.dens.lm)
##
## Call:
## lm(formula = log10(density_xbar) ~ yr, data = comid_inverts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2419 -0.3869 0.1537 0.4711 1.4927
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5367087 5.5189154 0.641 0.522
## yr -0.0000832 0.0027496 -0.030 0.976
##
## Residual standard error: 0.6851 on 867 degrees of freedom
## Multiple R-squared: 1.056e-06, Adjusted R-squared: -0.001152
## F-statistic: 0.0009157 on 1 and 867 DF, p-value: 0.9759
No change in density over time (same finding as site-by-site)
Visualization:
comid.yr.dens <- ggplot(yr_comid_avgs, aes(y = log10(density_xbar), x = yr))+
geom_point(size = 0.5)+
geom_smooth(se = T, size = 0.5)+
geom_smooth(se = F, method = "lm", size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
theme(legend.position = "none",
axis.text = element_text(),
)+
labs(x = "Year",
y = "log10(Mean Total Density)")
comid.yr.dens
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
comid.yr.dens <- lm(log10(density_xbar)~yr, yr_comid_avgs)
summary(comid.yr.dens)
##
## Call:
## lm(formula = log10(density_xbar) ~ yr, data = yr_comid_avgs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.53659 -0.06689 0.08403 0.15879 0.49567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.645814 14.403394 0.184 0.855
## yr 0.000484 0.007175 0.067 0.947
##
## Residual standard error: 0.3769 on 30 degrees of freedom
## Multiple R-squared: 0.0001517, Adjusted R-squared: -0.03318
## F-statistic: 0.004551 on 1 and 30 DF, p-value: 0.9467
No change in density over time (same finding as site-by-site)
First, extract all observations of P. californica into a new dataframe:
p_cal <- taxa_densities%>%
filter(scientificName == "Pteronarcys californica",
!splitCount == 0) #dropping samples where 0 individuals were counted
Visualization:
ggplot(p_cal, aes(x = sampleDate, y = taxa_density))+
geom_point(size = 0.5)+
geom_smooth(size = 0.5)+
geom_smooth(se = F, method = "lm", color = "Red", linetype = "dashed")+
theme_classic()+
labs(x = "Sample Date", y = "P. Californica Density (individuals/m2)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
pcal.dens.lm <- lm(taxa_density~sampleDate, p_cal)
summary(pcal.dens.lm)
##
## Call:
## lm(formula = taxa_density ~ sampleDate, data = p_cal)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.26 -28.68 -10.78 13.54 205.20
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 109.765792 49.878237 2.201 0.0329 *
## sampleDate -0.004859 0.003462 -1.403 0.1674
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 45.62 on 45 degrees of freedom
## Multiple R-squared: 0.04193, Adjusted R-squared: 0.02064
## F-statistic: 1.969 on 1 and 45 DF, p-value: 0.1674
Non-significant decrease in P. californica density over time (ignoring effects of site, etc.)
Calculate # unique sites with P. calif. each year:
pCal_yrs <- p_cal%>%
mutate(yr = year(sampleDate))%>% #create new column with year
group_by(yr)%>% #group by year
summarise(density_xbar = mean(taxa_density), #calculate mean density, # sites with P. Cal., and total abundance for each year
n_sites = length(unique(siteId)),
tot_abund = sum(splitCount))
Add total sites each year and calculate % of sites w/ P. cali.:
# Calculate total sites per year:
sites <- taxa_densities%>%
mutate(yr = year(sampleDate))%>% #Add year colum
group_by(yr)%>% #group by year
summarise(total_sites = length(unique(siteId))) #count # unique siteId's per year
#Merge with P cali data, calculate % sites with P. calif. each year:
pCal_yrs <- merge(pCal_yrs, sites)%>%
mutate(site_perc = (n_sites/total_sites)*100)
Visualization:
ggplot(pCal_yrs, aes(x = yr, y = site_perc))+
geom_point()+
geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
geom_smooth(size = 0.5)+
theme_classic()+
labs(x = "Year", y = "% of sites with P. californica present")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Basic LM:
pcal.site.lm <- lm(site_perc~yr, pCal_yrs)
summary(pcal.site.lm)
##
## Call:
## lm(formula = site_perc ~ yr, data = pCal_yrs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9849 -0.6676 -0.2487 0.5601 1.9133
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.88790 66.07993 -0.604 0.556
## yr 0.02061 0.03289 0.627 0.542
##
## Residual standard error: 0.8528 on 13 degrees of freedom
## Multiple R-squared: 0.02933, Adjusted R-squared: -0.04533
## F-statistic: 0.3928 on 1 and 13 DF, p-value: 0.5417
No change in # sites with P. californica present over time.
Visualization:
ggplot(pCal_yrs, aes(x = yr, y = tot_abund))+
geom_point()+
geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
geom_smooth(size = 0.5)+
theme_classic()+
labs(x = "Year", y = "Total P. californica abundance")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
No obvious trends in total abundance over time.
ggplot(pCal_yrs, aes(x = yr, y = density_xbar))+
geom_point()+
geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
geom_smooth(size = 0.5)+
annotate(geom = "label", x = 2019, y = 80, label = "P = 0.035; R2 = 0.30")+
theme_classic()+
labs(x = "Year", y = "Average P. californica density (# individuals/m2)")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Basic LM:
pcal.dens.yr <- lm(density_xbar~yr, pCal_yrs)
summary(pcal.dens.yr)
##
## Call:
## lm(formula = density_xbar ~ yr, data = pCal_yrs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.463 -16.434 8.193 12.534 43.442
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4260.4680 1805.1872 2.36 0.0346 *
## yr -2.1028 0.8985 -2.34 0.0359 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.3 on 13 degrees of freedom
## Multiple R-squared: 0.2965, Adjusted R-squared: 0.2423
## F-statistic: 5.478 on 1 and 13 DF, p-value: 0.03586
Significant decrease in mean density over time - When averaged across all sites with P. californica observations, density (# individuals/m2) has decreased between 2000 and 2023.
First, extract all observations of Zapada into a new dataframe:
zapada <- taxa_densities%>%
filter(sampleDate > "1990-01-01")%>% #dropping observations before 1990
filter(genus == "Zapada",
!splitCount == 0) #dropping samples where 0 individuals were counted
Check Distribution:
ggplot(zapada, aes(x = log10(taxa_density)))+
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Strong left skew - use log transform.
Visualization:
ggplot(zapada, aes(x = sampleDate, y = log10(taxa_density)))+
geom_point(size = 0.5)+
geom_smooth(size = 0.5)+
geom_smooth(se = F, method = "lm", color = "Red", linetype = "dashed")+
theme_classic()+
labs(x = "Sample Date", y = "log10(Zapada Density (individuals/m2))")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
zap.dens.lm <- lm(log10(taxa_density)~sampleDate, zapada)
summary(zap.dens.lm)
##
## Call:
## lm(formula = log10(taxa_density) ~ sampleDate, data = zapada)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.78419 -0.55566 -0.02905 0.54080 2.24495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.686e+00 1.053e-01 25.50 <2e-16 ***
## sampleDate -7.344e-05 7.247e-06 -10.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7452 on 1915 degrees of freedom
## Multiple R-squared: 0.05091, Adjusted R-squared: 0.05041
## F-statistic: 102.7 on 1 and 1915 DF, p-value: < 2.2e-16
Significant decrease in P. californica density over time (ignoring effects of site, etc.)
Calculate # unique sites with Zapada each year:
zapada_yrs <- zapada%>%
mutate(yr = year(sampleDate))%>% #create new column with year
group_by(yr)%>% #group by year
summarise(density_xbar = mean(taxa_density), #calculate mean density, # sites with P. Cal., and total abundance for each year
n_sites = length(unique(siteId)),
tot_abund = sum(splitCount))
Add total # sites per year and calculate %:
zapada_yrs <- merge(zapada_yrs, sites)%>%
mutate(site_perc = (n_sites/total_sites)*100)
Visualization:
ggplot(zapada_yrs, aes(x = yr, y = site_perc))+
geom_point()+
geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
geom_smooth(size = 0.5)+
theme_classic()+
labs(x = "Year", y = "Number of sites with Zapada present")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Basic LM:
zap.site.lm <- lm(site_perc~yr, zapada_yrs)
summary(zap.site.lm)
##
## Call:
## lm(formula = site_perc ~ yr, data = zapada_yrs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.7708 -9.1133 0.4998 5.1884 30.8810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -221.2326 527.2285 -0.420 0.678
## yr 0.1236 0.2626 0.471 0.642
##
## Residual standard error: 11.99 on 27 degrees of freedom
## Multiple R-squared: 0.008141, Adjusted R-squared: -0.02859
## F-statistic: 0.2216 on 1 and 27 DF, p-value: 0.6416
No change in % of sites with Zapada present over time.
Visualization:
ggplot(zapada_yrs, aes(x = yr, y = tot_abund))+
geom_point()+
geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
geom_smooth(size = 0.5)+
theme_classic()+
labs(x = "Year", y = "Total Zapada abundance")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Basic LM:
zap.abund.lm <- lm(tot_abund~yr, zapada_yrs)
summary(zap.abund.lm)
##
## Call:
## lm(formula = tot_abund ~ yr, data = zapada_yrs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1270.2 -629.7 -479.6 515.6 3573.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -36828.23 49341.11 -0.746 0.462
## yr 18.90 24.57 0.769 0.448
##
## Residual standard error: 1122 on 27 degrees of freedom
## Multiple R-squared: 0.02145, Adjusted R-squared: -0.0148
## F-statistic: 0.5917 on 1 and 27 DF, p-value: 0.4484
No change in Zapada abundance over time.
ggplot(zapada_yrs, aes(x = yr, y = density_xbar))+
geom_point()+
geom_smooth(method = lm, se = F, size = 0.5, linetype = "dashed", color = "Red")+
geom_smooth(size = 0.5)+
annotate(geom = "label", x = 2019, y = 750, label = "P = 0.036, R2 = 0.15")+
theme_classic()+
labs(x = "Year", y = "Average Zapada density (# individuals/m2)")
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Basic LM:
zap.dens.yr <- lm(density_xbar~yr, zapada_yrs)
summary(zap.dens.yr)
##
## Call:
## lm(formula = density_xbar ~ yr, data = zapada_yrs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -305.25 -72.70 -10.54 17.09 580.20
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16525.295 7405.867 2.231 0.0342 *
## yr -8.133 3.688 -2.205 0.0361 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 168.4 on 27 degrees of freedom
## Multiple R-squared: 0.1526, Adjusted R-squared: 0.1212
## F-statistic: 4.863 on 1 and 27 DF, p-value: 0.03615
Significant decrease in mean density over time - When averaged across all sites with Zapada observations, density (# individuals/m2) has decreased between 2000 and 2023.
First, select all sites with elevation >=3000m from zapada dataframe:
zapada_hi <- zapada %>%
filter(elev_m >=3000)
length(unique(zapada_hi$siteId))
## [1] 28
Only 3 sites above 3000m - probably can’t do much with that.
Alternative: identify 10 highest sites, extract data from only those sites:
zap_hiSites <- zapada %>%
select(siteId, siteLongitude, siteLatitude,elev_m)%>% #select site and elevation cols
distinct()%>% #remove duplicates
arrange(desc(elev_m))%>% #sort by elevation, largest to smallest
slice(1:10) #extract first 10 sites
zap_hi10 <- zapada %>%
filter(siteId%in%zap_hiSites$siteId) #create new dataframe with Zapada data from 10 highest sites
Check distribution of density at these sites:
ggplot(zap_hi10, aes(x = log10(taxa_density)))+
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Strong left skew, use log transform.
Is density changing at these sites?
ggplot(zap_hi10, aes(x = sampleDate, y = log10(taxa_density)))+
geom_point(size = 0.5)+
#geom_smooth(size = 0.5)+
geom_smooth(se = F, method = "lm", color = "Red", linetype = "dashed")+
theme_classic()+
labs(x = "Sample Date", y = "Zapada Density (individuals/m2)")
## `geom_smooth()` using formula 'y ~ x'
No clear trends.
Are there trends in prevalence, total abundance, or mean density over time?
Calculate number of sites w/ high elevation Zapada per year, mean density and total abundance per year:
zapHi_yrs <- zap_hi10%>%
mutate(yr = year(sampleDate))%>% #create new column with year
group_by(yr)%>% #group by year
summarise(density_xbar = mean(taxa_density), #calculate mean density, # sites with P. Cal., and total abundance for each year
n_sites = length(unique(siteId)),
tot_abund = sum(splitCount))
Add total sites each year and calculate % of sites w/ Zapada:
zapHi_yrs_pivot <- merge(zapHi_yrs, sites)%>% #merge with site numbers calculated above
mutate(site_perc = (n_sites/total_sites)*100)%>% #calculate % sites w/ high elevation zapada
pivot_longer(!yr, names_to = "measure", values_to = "value")%>% #pivot longer for plotting
filter(!measure == "n_sites", !measure == "total_sites")
Visualization:
ggplot(zapHi_yrs_pivot, aes(x = yr, y = value, color = measure))+
geom_point(size = 0.5)+
geom_smooth(size = 0.5)+
geom_smooth(se = F, method = "lm", linetype = "dashed")+
theme_classic()+
facet_wrap(~measure, scales = "free")+
labs(x = "Sample Date", y = "")+
theme(legend.position = "none")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
LM for density:
zapHi.dens.lm <- lm(density_xbar ~ yr, zapHi_yrs)
summary(zapHi.dens.lm)
##
## Call:
## lm(formula = density_xbar ~ yr, data = zapHi_yrs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -548.87 -196.70 -33.94 173.37 938.63
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 86019.54 43592.81 1.973 0.0891 .
## yr -42.67 21.70 -1.966 0.0900 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 465.5 on 7 degrees of freedom
## Multiple R-squared: 0.3558, Adjusted R-squared: 0.2637
## F-statistic: 3.866 on 1 and 7 DF, p-value: 0.09001
No Significant trends.
First, get map data for Utah:
utah <- map_data("state", region = "utah") #state polygon
ut.counties <- map_data("county", region = "utah") #county polygons
Make map of Utah with the 10 highest Zapada sites marked:
zap.map <- ggmap(get_map(c(-111.950684,39.419220), source = "stamen", zoom = 7, maptype = "terrain-background"))+ #basemap: terrain background (no labels) from google maps platform console, centered on UT centroid
geom_polygon(data = ut.counties, aes(x = long, y = lat, group = group), color = "black", fill = "white", size = 0.5, alpha = 0.0, linetype = "dotted")+ #add counties to map, use dotted lines
geom_polygon(data = utah, aes(x = long, y = lat, group = group), color = "black", fill = "white", alpha = 0.0)+ #add state outline to map, use solid line
geom_point(data = zap_hiSites, aes(x = siteLongitude, y = siteLatitude, color = elev_m, group = NULL), size = 4, pch = 20)+ #add zapada sites, color code by elevation.
labs(color = "Site elevation (m)", title = "10 highest sites with Zapada")+ #updating legend title
annotate(geom = "text", x = -111.8, y = 39, label = "Utah", size = 6, fontface = 4, color = "black", alpha = 0.75)+ #add "Utah" label to center of map
scale_color_gradient(low = "#DCB0FC", high = "#7004BF")+ #setting color gradient for site colors
theme_void()+ #removing lat/long from x and y axes
theme(legend.title = element_text(size = 10, face = "bold"), #adjusting legend title/text/plot title size, face, and position
legend.text = element_text(size = 10),
legend.position = "right",
plot.title = element_text(size = 12, face = "bold", hjust = 0.5)
)
## ℹ <]8;;https://maps.googleapis.com/maps/api/staticmap?center=39.41922,-111.950684&zoom=7&size=640x640&scale=2&maptype=terrain&key=xxx-cc_ziVfJIReObeHLX39pYWWq8https://maps.googleapis.com/maps/api/staticmap?center=39.41922,-111.950684&zoom=7&size=640x640&scale=2&maptype=terrain&key=xxx-cc_ziVfJIReObeHLX39pYWWq8]8;;>
## ℹ Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
zap.map
Save this plot:
ggsave("plots/zapada_map.pdf", zap.map, device = "pdf", units = "in", width = 6.5, height = 4.5, dpi = "retina")
Try making a map color coded by site trend?
First, need to calculate total density in each sample:
total_dens <- taxa_densities%>%
group_by(sampleId)%>% # group by sample ID
summarise(labSplit = mean(labSplit), #Add sample lab split, field split, and area
fieldSplit = mean(fieldSplit),
tot_dens = sum(taxa_density), # calculate total density in sample
area = mean(area))%>%
mutate(sample_prop = labSplit*fieldSplit) #calculate sample proportion (labSplit*fieldSplit)
Basic plot:
dens.labsplit <- ggplot(filter(total_dens, !labSplit == 1), aes(x = labSplit, y = tot_dens))+
geom_point(size = 0.5, alpha = 0.5)+
geom_smooth(size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
labs(y = "Total Sample Density", x = "")
dens.labsplit
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Log transformed plot:
dens.labsplit.log <- ggplot(total_dens, aes(x = labSplit, y = log10(tot_dens)))+
geom_point(size = 0.5, alpha = 0.5)+
geom_smooth(size = 0.5, color = "Red", linetype = "dashed", method = "lm")+
annotate(geom = "label", x = 0.5, y = 0.2, label = "P <0.001; R2 = 0.55")+
theme_classic()+
labs(y = "log10(Total Sample Density)", x = "Lab Split Proportion")
dens.labsplit.log
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
dens.ls.lm <- lm(log10(tot_dens)~labSplit, total_dens)
summary(dens.ls.lm)
##
## Call:
## lm(formula = log10(tot_dens) ~ labSplit, data = total_dens)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.71463 -0.26410 -0.05217 0.29127 1.67527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.17922 0.01133 368.97 <2e-16 ***
## labSplit -1.47447 0.01710 -86.23 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5062 on 6112 degrees of freedom
## Multiple R-squared: 0.5489, Adjusted R-squared: 0.5488
## F-statistic: 7436 on 1 and 6112 DF, p-value: < 2.2e-16
Significant negative relationship between lab split and total sample density.
Basic plot:
dens.sampleprop <- ggplot(total_dens, aes(x = sample_prop, y = tot_dens))+
geom_point(size = 0.5, alpha = 0.5)+
geom_smooth(size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
labs(y = "", x = "")
dens.sampleprop
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Log transformed plot:
dens.sampleprop.log <- ggplot(total_dens, aes(x = sample_prop, y = log10(tot_dens)))+
geom_point(size = 0.5, alpha = 0.5)+
geom_smooth(size = 0.5, color = "Red", linetype = "dashed", method = "lm")+
annotate(geom = "label", x = 0.5, y = 0.2, label = "P <0.001; R2 = 0.55")+
theme_classic()+
labs(y = "", x = "Sample Proportion")
dens.sampleprop.log
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
dens.sp.lm <- lm(log10(tot_dens)~sample_prop, total_dens)
summary(dens.sp.lm)
##
## Call:
## lm(formula = log10(tot_dens) ~ sample_prop, data = total_dens)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.71430 -0.26338 -0.05351 0.29128 1.67559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.17803 0.01131 369.27 <2e-16 ***
## sample_prop -1.47361 0.01709 -86.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5062 on 6112 degrees of freedom
## Multiple R-squared: 0.5489, Adjusted R-squared: 0.5488
## F-statistic: 7438 on 1 and 6112 DF, p-value: < 2.2e-16
Significant negative relationship between sample proportion and total sample density.
Basic plot:
dens.area <- ggplot(filter(total_dens, area <50), aes(x = area, y = tot_dens))+
geom_point(size = 0.5, alpha = 0.5)+
geom_smooth(size = 0.5, color = "Red", linetype = "dashed")+
theme_classic()+
labs(y = "", x = "")
dens.area
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Log transformed plot:
dens.area.log <- ggplot(filter(total_dens, area <50), aes(x = area, y = log10(tot_dens)))+
geom_point(size = 0.5, alpha = 0.5)+
geom_smooth(size = 0.5, color = "Red", linetype = "dashed", method = "lm")+
annotate(geom = "label", x = 1.75, y = 0.1, label = "P <0.001; R2 = 0.10")+
theme_classic()+
labs(y = "", x = "Sample area, m2")
dens.area.log
## `geom_smooth()` using formula 'y ~ x'
Basic LM:
dens.ar.lm <- lm(log10(tot_dens)~area, filter(total_dens, area <50))
summary(dens.ar.lm)
##
## Call:
## lm(formula = log10(tot_dens) ~ area, data = filter(total_dens,
## area < 50))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1364 -0.3778 0.1454 0.5158 1.4501
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.73072 0.01646 226.61 <2e-16 ***
## area -0.62620 0.02428 -25.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7157 on 6112 degrees of freedom
## Multiple R-squared: 0.09816, Adjusted R-squared: 0.09802
## F-statistic: 665.3 on 1 and 6112 DF, p-value: < 2.2e-16
Significant negative relationship between sample proportion and total sample density.
dens.samplemethod <- (dens.labsplit/dens.labsplit.log)|(dens.sampleprop/dens.sampleprop.log)|(dens.area/dens.area.log)
dens.samplemethod
Save:
ggsave("plots/dens_samplemethod.pdf", dens.samplemethod, device = "pdf", units = "in", height = 6.5, width = 9, dpi = "retina")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula 'y ~ x'
log10(Density) has a significant negative correlation with lab split, sample proportion (lab split * field split), and area. The R2 for lab split and sample proportion are fairly hight (0.55), while the R2 for area is very low (0.1).